Permanency Memories in Scene Depth Analysis

Authors

  • Miguel Angel Fernández
  • José M. López-Valles
  • Antonio Fernández-Caballero
  • María T. López
  • José Mira Mira
  • Ana E. Delgado
Abstract

There are several strategies for retrieving depth information from a sequence of images, such as depth from motion, depth from shading and depth from stereopsis. In this paper, we introduce a new method to retrieve depth based on motion and stereopsis. A motion detection representation helps establish further correspondences between different pieces of motion information. This representation is based on the permanency memories mechanism, where jumps of pixels between grey level bands are computed in a matrix of charge accumulators. For each frame of a stereovision video sequence, the method fixes the right permanency stereo memory and displaces the left permanency stereo memory pixel by pixel over the right one, under the epipolar restriction, in order to analyze the disparities of the motion trails calculated. By means of this functionality, for all possible displacements of one permanency memory over the other, the correspondences between motion trails are checked and the disparities are assigned, providing a way to analyze the depths of the objects present in the scene.

1 Stereovision-based Depth Analysis

In general, there are several strategies for retrieving depth information from a sequence of images, such as depth from motion, depth from shading and depth from stereopsis. In this paper, we introduce a new method to retrieve depth based on motion and stereopsis. In a conventional stereoscopic approach, two cameras are usually mounted with a horizontal distance between them. Consequently, objects displaced in depth from the fixation point are projected onto image regions that are shifted with respect to the image center. The horizontal component of this displacement can be used to determine the depth of the object. Due to the geometry of the optic system, and considering the epipolar constraint, it is thereby sufficient to restrict disparity analysis to the projection of corresponding linear segments in the left and right camera.
In some approaches, the disparity is computed by searching for the maximum of the cross correlation between image windows along the epipolar lines of the left and right image [1]. Similarly, this can be done by trying to match discernible image features. So far, many algorithms have been developed to analyze the depth in a scene; Brown et al. [2] give a good overview of them in their survey article. In many previous works, a series of restrictions are used to approach the correspondence problem. The most usual one is the disparity restriction, which considers it improbable that objects exist very close to the camera. The scene is usually limited to a medium distance, and in this way too high disparities are eliminated [3]. Koenderink and van Doorn [4] expressed the necessary theory in the best initial works related to the disparity restriction, and Wildes [5] implemented some of their ideas [6]. More recently, disparity in stereoscopy has continued to attract great interest (e.g., [7], [8]). According to the correspondence techniques used, we may classify methods into correlation-based, relaxation-based, gradient-based, and feature-based. The main correlation-based technique is the area correlation technique, which is based on considering the image pixel intensity values as a bidimensional signal, where one of the two images has been translated by the disparity (e.g., [9]). The basic idea of relaxation techniques is that pixels to be set into correspondence perform "controlled estimations", and the method permits the re-organization of correspondences. In this kind of process, the correlation values of the neighbors of a pixel are of great importance for the evaluation of the correspondence [10]. Methods based on the gradient, or on the optical flow, aim to determine local disparities between two images by formulating a differential equation that relates motion and luminance. It is convenient to study the gradient at all surrounding pixels.
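As an illustration of the area correlation technique just described (this is our own minimal sketch, not the authors' method; the function name, window size and disparity range are assumptions), the disparity at a pixel can be found by maximizing the normalized cross correlation of windows along one epipolar line:

```python
import numpy as np

def correlation_disparity(left_row, right_row, x, window=5, max_disp=16):
    """Area-correlation sketch on one epipolar line: find the horizontal
    shift d that maximizes the normalized correlation between a window
    around left_row[x] and the window around right_row[x - d]."""
    half = window // 2
    ref = left_row[x - half : x + half + 1].astype(np.float64)
    best_d, best_score = 0, -np.inf
    for d in range(0, max_disp + 1):
        c = x - d
        if c - half < 0:          # candidate window would leave the image
            break
        cand = right_row[c - half : c + half + 1].astype(np.float64)
        # Normalized cross-correlation of the two windows.
        num = np.dot(ref - ref.mean(), cand - cand.mean())
        den = np.linalg.norm(ref - ref.mean()) * np.linalg.norm(cand - cand.mean()) + 1e-9
        score = num / den
        if score > best_score:
            best_d, best_score = d, score
    return best_d
```

With a fronto-parallel configuration, a feature visible at column x in the left image appears at column x - d in the right image, so the search only scans non-negative shifts.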
This way, the correspondence information is more accurate [11]. Feature-based techniques limit themselves to reliable features, such as contours or curves, in the analyzed regions (e.g., [12]). All these developments approach depth analysis by different methods, but most of them have as a common denominator that they work with static images and not with motion information. In this paper, we have chosen as an alternative not to use direct information from the image, but rather information derived from motion analysis. The proposed system takes as input the motion information of the objects present in the stereo scene, and uses this information to perform a depth analysis of the scene.

2 Motion Detection from Permanency Memories

The input to our system is a pair of stereo image sequences. These sequences have been acquired by means of two cameras arranged in a parallel configuration, which is the easiest to work with due to its geometry. The central idea behind this approach is to transpose the spatially defined problem of disparity estimation into the temporal domain and compute the disparity simultaneously with the incoming data flow. This can be achieved by realizing that in a well-calibrated fronto-parallel camera arrangement the epipolar lines are horizontal and thereby identical to the camera scan-lines. Thus, the cameras will capture two similar, although not exactly equal, scenes. In case the images have been acquired in a convergent configuration, horizontal epipolar lines can be obtained by image rectification techniques [13] employed as a front-end to the algorithm; this way the results of the algorithm will be optimal. The motion analysis algorithm used in this work has already been proven in applications such as moving object shape recognition in noisy environments [14], classification of moving objects by motion features such as velocity or acceleration [15], and applications related to selective visual attention [16].
Motion analysis is performed separately on both stereovision sequences in two phases. The first phase is based on grouping neighboring pixels that have similar grey levels into closed and connected regions in an image frame. The method used is segmentation into grey level bands. It consists in reducing the resolution of the illumination levels of the image, obtaining a lower number of image regions, which potentially belong to a single object in motion. Let B(x,y,t) be the grey level band associated to pixel (x,y) at time instant t, GL(x,y,t) the grey level, n the number of grey level bands, and N the number of grey levels; then:

B(x,y,t) = round(GL(x,y,t) * n / N + 0.5)

A detailed analysis of the features and performance of this segmentation method is described in [17]. Obviously, segmentation into grey level bands is performed in parallel on each couple of images of the stereo sequence. Once the objects present in the scene have been approximated in a broad way, the second phase has to detect possible motions of the segmented regions. Again, motion information is extracted from both video sequences that form the stereo pair. Motion detection is obtained from the changes in luminosity of the image pixels as the video sequence goes on through time. Motion in an image segmented into grey level bands is detected through the variation of the grey level band of the pixels. Notice that it is not that important that regions do not completely adjust to the shapes of the objects, nor that at a given moment two different objects appear overlapped in the same region. Consider that the relative motion of the objects will force those regions belonging to the same object to move in a uniform way, and those regions that hold different objects to separate in the future. From motion detection, we now introduce a representation that may help to establish further correspondences between different pieces of motion information. This representation finds its basis in the permanency memories mechanism.
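The band formula above can be sketched in a few lines (the function name and default band count are our own; for non-negative grey levels, round-half-up of GL*n/N + 0.5 reduces to floor(GL*n/N) + 1, which avoids banker's-rounding subtleties):

```python
import numpy as np

def grey_level_bands(frame, n_bands=8, n_levels=256):
    """Quantize a grey-level frame into n_bands bands, following
    B(x,y,t) = round(GL(x,y,t) * n / N + 0.5).
    With round-half-up this equals floor(GL * n / N) + 1,
    yielding bands numbered 1..n_bands."""
    frame = np.asarray(frame, dtype=np.float64)
    return (np.floor(frame * n_bands / n_levels) + 1).astype(np.int64)
```

For n = 8 and N = 256, grey levels 0–31 fall into band 1, 32–63 into band 2, and so on up to band 8.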
Precisely, this mechanism considers the jumps of pixels between bands, and it consists in a matrix of charge accumulators. The matrix is composed of as many units in the horizontal and vertical directions as there are pixels in an image frame. This way, a position (x,y) of the image is associated to a permanency memory charge unit. Initially all accumulators are empty; that is to say, their charge is zero. The charge in the permanency memory depends on the difference between the grey level band values of the current and the previous image. An accumulator detects differences between the grey level bands of a pixel in the current and the previous frame. When a jump between grey level bands occurs at a pixel, the charge unit (accumulator) of the permanency memory at the pixel's position is completely charged (charged to the maximum charge value max). This is the way to record that motion has just been detected at this pixel. This complete charge is produced for jumps to superior bands as well as to inferior bands. Thus, the charge units of the permanency memory are able to inform on the presence of motion at the associated pixels. After the complete charge, each unit of the permanency memory decrements with time (on a frame-by-frame basis) down to the minimum charge value min while no motion is detected, or it is completely recharged if motion is detected again. This behavior is described by means of the following pseudo code, where again B(x,y,t) is the grey level band associated to pixel (x,y) at time instant t, and dec is a fixed quantity subtracted from the instantaneous charge of each charge unit each time a frame is analyzed and no motion is detected. Thus, this quantity determines the discharge velocity of the permanency memory.

if (B(x,y,t) != B(x,y,t-1))
    charge(x,y,t) = max;
else
    charge(x,y,t) = max(min, charge(x,y,t-1) - dec);

Obviously, the evolution of the charge in space depends on the velocity of the mobile in a direction.
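The charge/discharge rule above can be applied to the whole accumulator matrix at once. A minimal vectorized sketch (the constant values for max, min and dec are our own assumptions, since the paper leaves them unspecified):

```python
import numpy as np

# Illustrative constants; the paper does not fix max, min or dec.
MAX_CHARGE, MIN_CHARGE, DEC = 255, 0, 32

def update_permanency_memory(charge, bands, prev_bands):
    """One frame-by-frame update of the charge accumulator matrix:
    a grey-level-band jump recharges the unit to MAX_CHARGE, otherwise
    the unit discharges by DEC, saturating at MIN_CHARGE."""
    moved = bands != prev_bands
    decayed = np.maximum(MIN_CHARGE, charge - DEC)
    return np.where(moved, MAX_CHARGE, decayed)
```

Iterating this update over a sequence leaves a decaying trail of charge behind every moving region, which is exactly the motion-trail representation used in the rest of the paper.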
A slow mobile causes a short charge slope, as the object's advance from pixel to pixel may last various frames. During this elapsed time all affected units are discharging; in this case, between the charge and discharge of a unit, the mobile covers a short distance. On the other hand, a quick mobile causes various memory units to charge at the same time, such that there will be many more units affected by this motion. Thus, in this second case, between the total charge and discharge of a memory unit the mobile covers many pixels. Figure 1 shows all these issues. Figures 1a and 1b show two images of a monocular sequence. The advance of a car may be appreciated, as well as a slighter movement of a pedestrian. In figure 1c you may observe the effect of these moving objects on the permanency memory.

Fig. 1. Permanency memory: (a) one image of a sequence, (b) same perspective after some seconds, (c) motion trails as represented on the permanency memory.

The difference is presented between a quick object such as the car, which leaves a very long motion trail (from dark grey to white), and a pedestrian whose velocity is clearly slower and whose motion trail is nearly unappreciable with respect to the car's. Thus, permanency memories enable representing the motion history of the frames that form the image sequence; that is to say, there is segmentation from the motion of the objects present in the scene. However, the dependency of the permanency memories on the segmentation into grey level bands imposes a restriction. The reduction of the resolution in illumination levels produced by the segmentation into grey level bands does not exactly imply segmentation into objects. Some of the objects in the images are segmented into various regions, and physically distinct objects may be overlapped into the same region. Nevertheless, this issue is not that important when taking into account that our aim is to characterize the motion of the objects and not their shape.
3 Disparity Analysis from Permanency Memories

Motion-based segmentation as explained in the previous section facilitates the correspondence analysis. Indeed, the motion trails obtained through the permanency memories charge units are used to analyze the disparity between the objects in the stereo pair in an easier and more precise way. The set of all disparities between the two images of a stereo pair is called the disparity map. The retrieval of disparity information is usually a very early step in image analysis. It requires stereotyped processing where each single pixel enters the computation. In stereovision, methods based on local primitives such as pixels and contours may be very efficient, but they are too sensitive to locally ambiguous regions, such as occlusions or regions of uniform texture. Area-based methods are less sensitive to these problems, as they offer additional support to obtain correspondences for difficult regions in an easier and more robust way, or to discard false disparities. Although area-based methods are usually computationally very expensive, we introduce a simple area-based method with a low computational cost.

3.1. Motion trails for disparity analysis on epipolar lines

In order to explain our disparity analysis method we will start by analyzing the process at the level of epipolar lines. The key idea is that a moving object causes two identical trails to appear in epipolar lines of the permanency stereo memories. The only difference lies in their relative positions, affected by the disparity of the object at each moment. In figure 2, the charge values in two corresponding superimposed epipolar lines of the memories are represented. In a parallel configuration such as the one we have chosen, there will be no disparity between right and left image for objects at a great depth – imagine at infinity.
Nevertheless, when an object approaches the central point of the base line, that is to say, the point between the two cameras, the object appears further to the right on the left image and further to the left on the right image. This is precisely the disparity concept: closer objects have a greater disparity than more distant ones. Looking at figure 2 it is possible to analyze the motion of each one of the three objects present in the permanency memories from their motion trails. This initial analysis is independent of the epipolar constraint studied. You may observe that object "a", which has a long trail and has its maximum charge towards the left, is advancing to the left at a high speed. Object "b", with a shorter trail, is also advancing in the same direction but at a slower velocity. Finally, object "c", whose trail is horizontally inverted, is moving to the right at a medium velocity, as shown by its trail. Also from figure 2, but now comparing the motion trails in both epipolar lines, disparity is analyzed. The motion trail of object "b" presents a null disparity. Therefore, we can conclude that this trail corresponds to an object that is far away from the cameras; remember that, due to our parallel camera configuration, pixels with a null disparity are located at infinity. Object "a" has a slightly greater disparity. Finally, object "c" offers the greatest disparity.

Fig. 2. Disparity of permanency memories.

This simple example leads to three main conclusions. Firstly, in order to consider two motion trails to be correspondent, it must only be checked that both are sufficiently equal in length and in discharge direction in epipolar lines of the permanency stereo memories. Secondly, we may affirm that, in order to analyze disparities, one possibility is to displace one epipolar line over the other one, until we get the exact point where both lines are completely superimposed.
In other words, an epipolar line has to be displaced over the other until the motion trails coincide. Of course, the right epipolar line can be displaced over the left or the left epipolar line over the right. When the motion trails coincide, the displacement value applied to the epipolar line is the disparity value. Thirdly, if we consider the representation of a mobile with a high velocity, various charge units of the permanency memories may charge simultaneously. This way, an object may correspond to various disparities. This is the reason why one single memory unit is not able to establish the disparity of an object: it is necessary to analyze the correspondence from the values of various units, and the decision of all units has to validate the overall disparity value. The most efficient way to manage this is that each pixel chooses its disparity in such a way that the maximum number of its neighboring units confirm the disparity. All these considerations tell us that the disparity analysis at the epipolar line level consists in superimposing both epipolar lines with different relative displacements and in analyzing the correspondences produced in the neighborhood of each unit. The displacement that causes a maximum number of surrounding elements to confirm its correspondence proves to be the most trustworthy disparity value.

3.2. Motion trails for disparity analysis on permanency memories

The generalization to a global analysis on complete stereo images is not too complex. The idea consists in totally superimposing the two permanency stereo memories under study, and not only their epipolar lines. One of the memories will be displaced over the other, looking for motion trails that coincide in both the x and y directions. Once the displacement for which the coincidence of motion trail pixels is maximum in size has been calculated, this value is assigned as the disparity value.
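The epipolar-line procedure of section 3.1 can be sketched as follows. This is our own illustration of the neighbourhood vote described above, not the authors' implementation; the function name, search range, neighbourhood radius and charge-equality test are assumptions:

```python
import numpy as np

def epipolar_disparity(left_line, right_line, max_disp=16, radius=2):
    """For each unit of the right epipolar line, pick the displacement d
    of the left line for which the most neighbouring charge units hold
    the same charge value (a per-pixel neighbourhood vote)."""
    n = len(right_line)
    disparity = np.zeros(n, dtype=int)
    for x in range(n):
        best_d, best_votes = 0, -1
        for d in range(max_disp + 1):
            votes = 0
            for k in range(-radius, radius + 1):
                xr, xl = x + k, x + k + d   # left trail sits further right
                if 0 <= xr < n and 0 <= xl < n:
                    if right_line[xr] > 0 and left_line[xl] == right_line[xr]:
                        votes += 1
            if votes > best_votes:
                best_d, best_votes = d, votes
        disparity[x] = best_d
    return disparity
```

Only units with non-zero charge cast votes, so empty background does not bias the winning displacement; ties resolve toward the smallest (most distant) disparity.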
As we have already explained, the inputs to the system are the permanency memories of the right and left images of the stereo video sequences. When an object moves in the scene, the effect in both cameras is similar in terms of the charge accumulated in the memory units. If little time has elapsed since an object moved, the charge will be close to the maximum value max in both permanency memories; if a lot of time has elapsed since it moved, the charge will be much lower or even equal to the minimum value min in both memories. Thus, we may assume that units with equal instantaneous charge values in their permanency memories correspond to the same objects. Let us consider the scheme of figure 3, where all the elements involved in the disparity calculus are shown. In the upper part of figure 3, the two permanency memories to be put into correspondence may be observed. In the lower part, the processing system formed by two unit layers is represented. Each layer stores the information corresponding to one of the permanency stereo memories. Thus, each layer has as many processing units as there are units in a permanency memory.

Fig. 3. Disparity analysis processing.

Both layers are situated one over the other, in such a way that units situated in the same rows and the same columns of the two layers are considered homologous. The idea is to superimpose the two permanency memories and then to displace one memory pixel by pixel over the other, looking for correspondences. The implementation of this idea is explained by means of the following pseudo code:
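The authors' pseudo code itself does not appear in this excerpt. As a hedged stand-in, the following is our own sketch of the two-layer displacement-and-vote procedure described above, generalized from the epipolar-line case; all names, the disparity range and the neighbourhood radius are assumptions, and np.roll's edge wraparound is a simplification a real implementation would mask out:

```python
import numpy as np

def memory_disparity(left_mem, right_mem, max_disp=16, radius=1):
    """Displace the left permanency memory pixel by pixel over the right
    one along the horizontal (epipolar) direction; each unit keeps the
    displacement for which most units in its 2-D neighbourhood coincide."""
    h, w = right_mem.shape
    best_votes = np.full((h, w), -1)
    disparity = np.zeros((h, w), dtype=int)
    active = right_mem > 0                      # only charged units vote
    for d in range(max_disp + 1):
        shifted = np.zeros_like(left_mem)
        if d == 0:
            shifted[:] = left_mem
        else:
            shifted[:, :-d] = left_mem[:, d:]   # slide left memory leftwards
        # A unit coincides when both memories hold the same charge there.
        coincide = (active & (shifted == right_mem)).astype(int)
        # Count coincidences in a (2*radius+1)^2 neighbourhood.
        votes = np.zeros((h, w), dtype=int)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                votes += np.roll(np.roll(coincide, dy, axis=0), dx, axis=1)
        better = votes > best_votes
        disparity[better] = d
        best_votes[better] = votes[better]
    return disparity
```

Each displacement d is evaluated for the whole memory at once, so the cost is linear in the disparity range, in line with the low computational cost claimed for the method.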




Publication date: 2005